I. Impetus for the Concept of an All-Source Electronic Biographic
Intelligence Center

There are numerous files of biographic information maintained
by various member agencies of the Intelligence Community and the
Federal Government. Each of these files is tailored to the mission
and responsibilities of the particular agency maintaining the file.
Exploitation of these biographic files falls into two major
categories: (1) official U.S. Government name checks, and
(2) Intelligence Community biographic research.

The first of these uses requires rapid and thorough searching
of all pertinent biographic files in the Federal Government complex.
At present this is seldom done thoroughly and rarely done rapidly.
It is characteristic of a large proportion of these [...] result in
a negative report, that is, no information. Some way of expediting
this sort of routinized reply to a request by automatic or
semi-automatic means would permit a tremendous saving in human
resources as well as provide an improved name check service.

The second of these uses, i.e., biographic research on
personalities of intelligence interest, is substantively much more
complex. In this case the requestor is looking for all fragments of
available information (both classified and unclassified) about an
individual or individuals. These bits and fragments are subjected to
analysis, from which process an intelligence product or judgment
(decision) ensues.

The division of responsibility for biographic intelligence is
formalized in various Director of Central Intelligence Directives
(DCIDs). "A study of the pertinent biographic directives suggests
that two basic policy decisions dictated administrative
decentralization in this area. One was a decision to separate
biographic files, based on 'collateral' data, by the general
occupation of the individuals described therein. The other was to
separate personality files of a 'non-collateral' nature by the type
of source from which the information was derived. As a consequence
of the former, five different agencies are currently engaged in
compiling, maintaining and servicing from biographic information
files. They are as follows:

The division of substantive responsibility for maintaining
each of these files is in some cases artificial and/or overlapping.
Duplication of effort or gaps in coverage result. In addition,
proliferation of biographic reference points makes the consumer's
task extremely difficult. Not only should he be aware of all the
various files he can exploit, he also needs to know whether these
files are redundant or complementary in order to place his requests
most expeditiously.

1. The Biographic Register Information System: System Description.
Central Intelligence Agency, Office of Central Reference, December 1960.

Most agencies have frequent need to search the files of at
least one other agency maintaining biographic information.
Frequently such inter-agency name searches result in a negative
report (no information). But in order to deliver the negative
report, nearly as much file searching is required as in servicing a
positive report. Human resources are sadly depleted by this
constant and significantly large demand on human file searching
capabilities.

The advantages of a central reference point for requests
involving biographic information have been recognized by consumers
of this kind of information for a long time. Until recently the
attendant problems involved in processing such a large volume of
diverse information encompassing all levels of security
classification made the task virtually impossible. However, with the
advent of new electronic data processing equipment permitting
comprehensive indexing in depth and rapid retrieval of large volumes
of information, the concept of an all-source electronic biographic
intelligence center begins to fall in the realm of visionary
planning.

This paper addresses itself to several dimensions of this
concept that would need to be considered by systems planners and
designers of such a center. The first of these considerations
concerns a survey of extant biographic files and a decision as to
which of these should be included in an all-source system.

A second consideration deals with operational and substantive
advantages and disadvantages of the concept of an all-source
biographic center.

The third consideration addresses itself to consumer
requirements for use of such a system. Development of these
requirements establishes the guidelines for design of indexing
schemes, file organization, types of retrieval capability, interface
constraints, etc.

A fourth consideration outlines the problem areas that will
inevitably arise in the design of such a system and suggests
possible ways for coping with them.

II.


25X1 


Assigned primary responsibility for biographic information
on scientific and technical personalities, and significant
economic personalities not covered by the Department of State.

25X1

Maintains biographic information files on Agitprop officials,
party theoreticians, academicians, and Soviet commentators
(includes scattered information on some East European
satellite and Chinese Communist propagandists); maintains a
catalog of Soviet radio commentators with each one's specialty
noted; maintains a card file listing radio lectures by candidate
academicians in the social sciences; maintains a card
file of authors of articles that have appeared in KOMMUNIST,
top theoretical journal of the CPSU.

5. GR/OCR/DD/I

Assigned responsibility for maintaining photographic
personality files on people of intelligence significance.

6. OS/DD/S

Assigned responsibility for maintaining biographic files 
on all persons for whom a security investigation has been 
initiated by CIA. 

B. State (Biographic Information Division, Office of Functional
and Biographic Intelligence, Bureau of Intelligence and
Research)

Assigned primary responsibility for political, social and
cultural personalities, and economic personalities of political
significance.

C. Army/Navy/Air

Assigned responsibility for maintaining a biographic
intelligence file system on foreign military personalities of
their counterpart services.

E. FBI

Assigned responsibility for maintaining internal security
biographic information files on all U.S. citizens, foreign
visitors to the U.S., and all persons of known or suspect
hostile intent to the U.S.

25X1 


F. AEC (should it be included?)

G. Naturalization and Immigration Service (should it be included?)

III. All-Source Biographic Center - Advantages versus Disadvantages

A. Advantages

1. The most obvious advantage of an all-source biographic
center would be a single, central reference point for all
consumers of biographic information in the Federal Government
and the Intelligence Community. A requestor would
place his request only once; the response to his request
would include all information available to him throughout
the Federal Government and/or the Intelligence Community.

2. A single biographic center would eliminate or at least
reduce overlapping efforts in several agencies to index and
control the same biographic information. It would also
point up gaps in biographic coverage.

3. Time consumed in requesting documents and/or enclosures
from other agencies would be eliminated.

B. Disadvantages

1. The centralization of all biographic files in one location
would preclude the maintenance of biographic information
repositories in any of the agencies participating in the
central system. Any duplication of extant files would be
both inefficient and uneconomical. The importance of this
point is that centralization would deprive each participating
agency of autonomous control over its biographic
reference function for internal support. They would be
forced to depend upon the central repository. Unless
the central facility really provided timely and comprehensive
reference service, the participating agency would
find itself with no alternative capability to retrieve
biographic information. (One way to safeguard against
this possibility might be to centralize only the index
to extant biographic files. The actual repositories of
biographic information would remain with the participating
agencies, who would then have the capability to utilize
their own files for internal support as well as
profit from the availability of a central biographic
reference service for external requests.)

2. In order for a central biographic reference system to be
effective, the index to file holdings (whether those files
are centrally located or decentralized) would need to be
both comprehensive in fields of information controlled and
omniscient in depth of indexing employed. The quality of
reference service provided by a centralized system would be
dependent upon the speed and accuracy with which participating
agencies supplied new, revised, or purged index
entries to the central system. If the central system could
not provide satisfactory reference support, each agency
would rely on its own biographic holdings, vitiating any
potential advantages a central reference facility could
provide. Member agencies would then have returned to the
old decentralized system but with the appendage of an
expensive, emasculated central facility ignored by the
consumers it was designed to serve.

3. Unless operationally feasible and foolproof methods for
matching the security classification of documents to the
security clearance of the requester can be designed,
effective central reference service would hang up on
security screening delays before release of materials to
requesters.

4. In order for an electronic central reference facility to
provide effective service, a uniform machine-readable
language must be developed for indexing and retrieval. A
requester unfamiliar with this language and remotely
located from this central reference point may have
difficulty in selecting the best way to phrase his request.
This communication barrier between the requester and the
central biographic index may require the assistance of
intermediaries who can translate the requester's question
into appropriate terms for precise machine searching.
Biographic specialists or reference technicians may be
necessary to bridge this gap.

5. Centralizing and standardizing indexing control over
biographic holdings may work a hardship on certain
agencies who ordinarily would only superficially index
their biographic material. Forcing them to conform to
a standard indexing in depth procedure would create an
extra workload they may not be willing to assume. The
alternative is to have the indexing function done at the
central facility.



Approved For Release 2004/01/15 : CIA-RDP79M00097A00020001 0005-5 



IV. Designing the System

A. Determining the Limits of the Data Base

The first step in designing an automated central biographic
reference system requires a determination of 1) all extant
biographic files, 2) the size and nature of the indexes to these
files, and 3) characteristics of the document holdings themselves
(e.g., dossiers, hard copy bulletins and reports, aperture
cards, microfilm, positive and negative photos or film
strips, etc.).

B. Determining Consumer Requirements

The next step requires an investigation of how consumers
would optimally use these files. All classes of requests must
be specified and the volume of requests in each class determined.

C. Basic Factors in Any Storage and Retrieval System

Any information storage and retrieval system must consider
several problem areas in the design phase: 1) the information
control problem, 2) the physical storage problem, and 3) the
retrieval problem. In the information control area, several
basic questions must be answered:

1. Is the system to be an information storage and retrieval
system or a document storage and retrieval system or both?

2. What categories of information are important enough to
control for every document indexed?

3. What type of indexing scheme(s) is/are to be employed?
(hierarchical codes, classificatory or taxonomic codes,
coordinate indexing, uniterms, key words, a combination of
these, or some new technique)

SECRET

4. How much modifying information is to be included in the
index? (linkage analysis, modifiers, descriptors, action
and use codes, directional indicators, tags, cross
references, see also entries, etc.)

5. If alphanumeric codes are used, should the system be able
to convert them to their clear text equivalents?

6. How does the system provide for consistent indexing of the
same kinds of information? (inter- and intra-coder
reliability)

7. How much of the indexing can be done automatically, i.e.,
by computer dictionary look-up procedures?

8. How is name verification to be accomplished?

9. When does the name verification process occur? At time of
input or at time of servicing a request?

10. Can the input format be controlled at the source originating 
the document? 

11. If input is automatic, can the input processing equipment 
perform "legality checks" on errors in format? 

The physical storage problem requires answers to these basic
questions:

1. How should the document holdings be stored? (hard copy,
aperture cards, document images, microfilm, magnetic tape,
punched paper tape, punched cards, random access storage, etc.)

2. If the index is to be mechanized, in what form will it be
maintained? (punched cards, magnetic tape, random access
storage, core memory, photoscopic disc, etc.)

3. In what format(s) is the indexed information to be stored?

4. In what sequence(s) is the indexed information to be stored?

5. Should all index entries be stored in one file or
partitioned among several files?

6. What kinds of physical security are required to protect the
data base and the index from getting into the hands of
uncleared personnel?

7. What kind of file backup is required to protect against
loss of the data base and the index by catastrophe?

8. Are the data base and index stored in a form (machine
readable or otherwise) that is compatible with users' equipment
and/or needs? (the interface problem)

9. Is it desirable to purge the file of old or obsolete
information at periodic intervals?

The retrieval problem requires answers to these questions:

1. Does the index storage sequence(s) permit rapid retrieval
of the major information categories?

2. Can the index entries selected by a search be listed in
clear text and in a variety of sequences for easier use
by the requestor?

3* Once a search of the index is made, are the documents 
referenced easy to locate? 




4. Does the index system lend itself to data extraction and
refinement?

5. Can a document or information be recovered from the file
without removing it from the file, thus making it
unavailable for other requests?

6. Is it desirable to incorporate into the index evaluative
comments from previous requesters about quality of the
information or its interpretation so that a new requester
can take advantage of previous analysis in his area of
interest? (evaluative feedback to the system)

D. File Organization

1. Systems of File Organization

a. The Unit Record File. All the information indexed
from one document constitutes a single index entry
or unit record. Depending upon the physical medium
employed to store the index entry, this unit record
may be bound by equipment limitations. Punched cards
are limited to 80 columns of information which must
be allocated on the card to fixed length fields for
ease of search, selection, and sequencing by EAM
equipment. Data processing computers permit variable
length unit records with both fixed and variable
length fields.

b. The Unit Record File with Internal Coordinate Indexing.
All the information indexed from one document constitutes
a single index entry or unit record. The index is
stored in sequence by a primary field. A second
index is constructed on a secondary field indicating
which entries in the primary sequence contain each
kind of information displayed in the secondary field.

An example of this kind of file organization might be
a file of biographic information about all scientists
attending international meetings. The primary sequence
of this file would be maintained by name. A second
index could be constructed relating to a secondary
field, say, institution employed by. The names of all
institutions occurring in the primary index would
appear in the secondary index in alphabetic order.
Along with each institution in the secondary index
all entries in the first index mentioning that
institution would be cross referenced.
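
The scientist example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`name`, `institution`, `doc`), the sample entries, and the function names are hypothetical and do not come from any system described in this paper.

```python
from collections import defaultdict

# Hypothetical unit records: one index entry per document.
records = [
    {"name": "PARKER, L.", "institution": "Institute of Physics", "doc": "D-001"},
    {"name": "ADAMS, J.", "institution": "Academy of Sciences", "doc": "D-002"},
    {"name": "STONE, M.", "institution": "Institute of Physics", "doc": "D-003"},
]

def build_indexes(unit_records):
    """Return (primary, secondary).

    primary   : the unit records kept in sequence by name.
    secondary : institution -> positions of the primary entries that
                mention it, with institutions in alphabetic order.
    """
    primary = sorted(unit_records, key=lambda r: r["name"])
    secondary = defaultdict(list)
    for position, record in enumerate(primary):
        secondary[record["institution"]].append(position)
    return primary, dict(sorted(secondary.items()))

def names_at(institution, primary, secondary):
    """Follow the cross references from the secondary index to the primary."""
    return [primary[i]["name"] for i in secondary.get(institution, [])]

primary, secondary = build_indexes(records)
```

A request by institution touches only the secondary index and the few primary entries it points to, rather than the whole name file.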

25X1 A . I _ 

c. The Multi-segment File with Unit Record Input and
Internal Cross Referencing. All the information indexed
from one document constitutes a single unit record for
input. However, the format of the unit record is
partitioned into segments, each controlling different
kinds of information. As the unit record is read
through the input processing equipment, it is
fractionized into these segments and stored in separate
internal files. The processing equipment provides
internal cross referencing addresses so that the original
unit record can be reassembled or segments can be
related. This type of file organization facilitates
retrieval when unit records are long and variable and
when most requests involve searching only part of the
fields in a record.
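
A minimal sketch of this arrangement follows, assuming a hypothetical three-segment layout and an integer cross-reference address shared by all segments of one record. The segment names, field names, and class are illustrative only.

```python
# Hypothetical segment layout: which fields are stored in which internal file.
SEGMENTS = {
    "identity": ("name", "birth_year"),
    "career": ("institution", "position"),
    "source": ("doc_ref", "classification"),
}

class MultiSegmentFile:
    def __init__(self):
        # One internal file per segment, keyed by cross-reference address.
        self.files = {segment: {} for segment in SEGMENTS}
        self.next_address = 0

    def store(self, unit_record):
        """Fractionize one unit record into segments sharing one address."""
        address = self.next_address
        self.next_address += 1
        for segment, fields in SEGMENTS.items():
            self.files[segment][address] = {f: unit_record.get(f) for f in fields}
        return address

    def search(self, segment, field, value):
        """Search a single segment file; return the matching addresses."""
        return [a for a, part in self.files[segment].items() if part[field] == value]

    def reassemble(self, address):
        """Rebuild the original unit record from its segments."""
        record = {}
        for segment in SEGMENTS:
            record.update(self.files[segment][address])
        return record
```

A request that concerns only career data scans the `career` file alone; the shared address then recovers the rest of the record when it is wanted.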

d. The Multi-level File. This file structure is organized
into several levels like a pyramid. The first or top
level usually contains only timely or critically
important information and is stored in high speed random
access memory. The second level contains additional
information elaborating on what is contained in the
first level file and is usually stored in random
access memory or on magnetic tape. The third level
contains further pertinent but less essential
information such as historical background, related
documents, previous patterns, evaluations by other
consumers, summaries, etc. The three file levels can be
tied together by cross referencing so that a requester
can move down through the file structure in as much
depth as he requires. This kind of file organization
is especially suited to man-machine systems where the
analyst interrogates the file, receives an answer,
generates a new question, re-interrogates the file, etc.
until he has satisfied all leads. The third level file
is most economically stored in document image form such
as on Minicards, aperture cards, film strips, or
microfilm.
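
The movement down through the levels can be illustrated as follows. The sketch assumes that the three levels share a common reference key; the key `P-17` and the contents of each level are invented for illustration.

```python
# Three hypothetical levels tied together by a shared reference key.
LEVELS = [
    {"P-17": "Level 1: current post and critical items"},
    {"P-17": "Level 2: elaboration of the level 1 entry"},
    {"P-17": "Level 3: document images, reel 4, frames 120-131"},
]

def interrogate(key, depth, levels=LEVELS):
    """Return the entries for `key` from level 1 down to `depth` (1-based)."""
    if not 1 <= depth <= len(levels):
        raise ValueError("depth out of range")
    return [level[key] for level in levels[:depth] if key in level]
```

An analyst would begin with `interrogate("P-17", 1)` and raise the depth only when the first answer leaves a lead unsatisfied, so that the slower lower levels are consulted only on demand.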

2. Factors to be Considered in Determining File Structure and
Organization

a. Scope of the information to be controlled (breadth of
the data base)

b. Volume of information to be controlled

c. Rate of retirement of purged information from the file

d. Depth of indexing required

e. Maximum physical length of a unit record

f. Average physical length of a unit record

g. Variety of input formats required

h. Access time to information stored in different kinds
of memory (core, RAMAC, magnetic tape, magnetic drum,
document image, etc.)

i. Sophistication and speed of search logic

j. Relative costs

3. Desirable Features in File Structure and Organization

a. Simplicity and ease of input (automatic input processing
being the ultimate goal)

b. Simplicity and accuracy of internal editing and filing

c. Search flexibility

d. Speed of search

e. Quality of search (few "false drops", no misses)

f. Adaptability to change (dynamic and fluid file design)

g. Standardized indexing and request language

h. Incorporation into the file of evaluative feedback from
users of the system.

V. An All-Source Electronic Biographic Intelligence Center: Some
Tentative Charts

Figure 1 is a graphic representation of the concept of a 
central Biographic Intelligence Center serving the needs of 
member agencies of the Intelligence Community and the Federal 
Government. 

Figure 2 concerns itself with the relationships between
the logical components of an all-source electronic biographic
information system. This figure is designed to suggest possible
kinds of input to and output from a central biographic
index system. The actual document files may be decentralized.

Index entries prepared by participating agencies may enter
the system by direct communication link from the field station
reporting or by punched cards prepared in the Washington area
or elsewhere. A stored computer program would process the
input into the various index files, performing necessary
information control operations such as checking the legality
of formats, augmenting the indexes by addition of information
from dictionaries and gazetteers, and providing internal cross
referencing for retrieval purposes.
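
The input operations named above (legality checks on format and augmentation from a gazetteer) might be sketched as follows. The required fields, the list of markings, and the gazetteer contents are assumptions made for the sake of the example, not a description of any actual input format.

```python
# Hypothetical input format and look-up table, for illustration only.
REQUIRED_FIELDS = ("name", "doc_ref", "classification")
ALLOWED_MARKINGS = {"UNCLASSIFIED", "CONFIDENTIAL", "SECRET", "TOP SECRET"}
GAZETTEER = {"PRAGUE": "CZECHOSLOVAKIA", "LYON": "FRANCE"}

def legality_errors(entry):
    """Return the format errors found in one index entry (empty if legal)."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if not entry.get(f)]
    marking = entry.get("classification")
    if marking and marking not in ALLOWED_MARKINGS:
        errors.append(f"unknown classification: {marking}")
    return errors

def process_input(entries):
    """Separate legal from illegal entries and augment the legal ones."""
    accepted, rejected = [], []
    for entry in entries:
        errors = legality_errors(entry)
        if errors:
            rejected.append((entry, errors))
            continue
        enriched = dict(entry)
        place = entry.get("place", "").upper()
        if place in GAZETTEER:
            # Add the country from the gazetteer without altering the source fields.
            enriched["country"] = GAZETTEER[place]
        accepted.append(enriched)
    return accepted, rejected
```

Rejected entries are returned with their errors so that they can be sent back to the originating agency for correction rather than silently dropped.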

Reference service from the central system could take the
form of lists of document references and where they can be
located, listings of selected index entries arranged in various
sequences, selected information on magnetic tape or
punched cards for use in another data processing system, or
paper tape for transmitting name check responses to the field
via direct communication link.

Figure 2

Logical Components of An Electronic
Biographic Intelligence Information System
VI. All-Source Electronic Biographic Intelligence Center: Problem Areas

A. Security Safeguards 

1. Physical Site Security. The computer site should be
established with a security environment corresponding to the
highest level of security classification included in the document
holdings. All operating and maintenance personnel should
have this level of clearance. This arrangement would avoid the
daily security problems connected with making sure that all
input-output devices, core memory, ancillary storage, buffer units,
and console had been cleared of classified information after
each scheduled operation.

2. Communications Security. The configuration of on-line and
off-line equipment at the computer site plus any direct
communication links with satellite users would have to be made
secure against compromising emanations from line or air
transmissions. This would include any Flexowriter-type equipment
used in the field to prepare input to the central system.

3. Top Secret Control. Any information entering the system
with unknown security classification should be treated as
though it had the highest classification until otherwise
confirmed.

The problem of how to treat a collection of top secret
documents all contained on one reel of magnetic or paper tape must
be resolved. The clerical burden of logging in and out all of
these individual documents every time a reel of tape is moved
from one person's jurisdiction to another's must be avoided and
some other less cumbersome form of security control provided.

4. Personnel Security.

a. Personnel Operating the Center. All operating,
maintenance, and necessary support personnel assigned to the
Center should be cleared to the highest level of security
classification used in the Center.

b. Consumers. A foolproof system for matching the security
level of information provided to a consumer with the
consumer's security clearance must be assured by
automatic means. This certainly involves including a
field for security classification in the format(s) for
each index entry so that the computer programs used in
searching the index will select only that information
which the requester is entitled to see.

If there is additional information in the system of
a higher security classification than the requester is
entitled to see, a supervisory person with the higher
security clearance could be informed of this fact. He
could then decide if the person making the request was
being deprived of vital information he needed to know
and, consequently, this supervisory person could take
steps to initiate the higher level security clearance
needed by the requester.
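
The screening described for consumers might be sketched as follows. The ordering of markings, the field name `classification`, and the wording of the supervisory notice are assumptions; entries with a missing or unknown marking are treated as the highest level, in keeping with item 3 above.

```python
# Hypothetical ordering of markings, lowest to highest.
RANK = {"UNCLASSIFIED": 0, "CONFIDENTIAL": 1, "SECRET": 2, "TOP SECRET": 3}
HIGHEST = max(RANK.values())

def entry_rank(entry):
    """Rank of an index entry; unknown or missing markings count as highest."""
    return RANK.get(entry.get("classification"), HIGHEST)

def screen(entries, clearance):
    """Return (releasable entries, number of entries withheld)."""
    limit = RANK[clearance]
    releasable = [e for e in entries if entry_rank(e) <= limit]
    return releasable, len(entries) - len(releasable)

def supervisor_notice(requester, withheld):
    """Text for the supervisory person, or None if nothing was withheld."""
    if withheld == 0:
        return None
    return f"{withheld} index entries withheld from {requester}; review need to know."
```

Only the count of withheld entries reaches the notice, so the supervisory review can be triggered without disclosing the content of the withheld entries.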
B. Variant Input Languages

When the origination of biographic information cannot be
controlled at the source, it may be received by the system in a
variety of languages. Some solution to this problem must be
found. It probably is not feasible to translate all foreign
language document receipts into English before processing them
into the system. Perhaps a better solution would be to prepare
the index entry in English but leave the document or source
referenced as it was received. This leaves the burden of
translation on the requester before he can exploit the
information contained in the document or source. It also would
require indexers with multi-language capability.

C. Variant Name Spellings

The problem of when and how to verify the identity of a
person named in a document must be resolved. In some systems
every name is verified before it enters the
system. This is required so that the document or source can
be filed in the dossier or section of a card file along with
other references to the same individual. At the time of
servicing a request for this individual, retrieval is a fairly
simple task. Once the proper dossier number is determined by
a biographic locator system, all references to the individual
are already physically assembled in one place.

Systems that incorporate name verification at the input
phase may create unnecessary work. Much of the information in
a file may lie dormant and never be of interest to requesters.
In time, this material would be retired from the active file
because of date of information, low circulation rate, or some
other criterion. The time spent in verifying who the individual
is has been wasted.

Other systems reserve name verification to the output
phase (e.g., [...]). The name in a document or source is not
verified until it is of interest to some requester. Thus,
documents which have no interest among consumers get the minimum
amount of input processing required to get them into the system.
This appears to be a more economical way to handle the variant
name problem, provided that the verification process at the
output stage is not so time-consuming that poor reference
service results. Time-consuming name verification at output
can be alleviated by storing index entries for biographic
references with similar names in common groups. Establishing
these name groups requires intensive study and understanding
of transliteration of names from one language to another, common
misspellings or corruptions of names, alternative spellings, etc.
But once inclusive name groups have been established, servicing
a request involves only the examination of a manageable group
of biographic index references rather than the entire file.
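
The grouping idea can be illustrated with a deliberately crude key. The substitution rules below are hypothetical stand-ins for the intensive transliteration study the text calls for; they are not a recommended scheme, and the sample names are used only to show variant spellings falling into one group.

```python
import re
from collections import defaultdict

def group_key(name):
    """Crude, hypothetical grouping key built from the surname only."""
    surname = name.split(",")[0].upper()
    surname = re.sub(r"[^A-Z]", "", surname)
    # Merge a few spellings that often vary in transliteration (assumed rules).
    for variant, canonical in (("TCH", "CH"), ("W", "V"), ("Y", "I"), ("J", "I")):
        surname = surname.replace(variant, canonical)
    # Collapse doubled letters.
    return re.sub(r"(.)\1+", r"\1", surname)

def build_name_groups(index_entries):
    """Store index entries with similar names in common groups."""
    groups = defaultdict(list)
    for entry in index_entries:
        groups[group_key(entry["name"])].append(entry)
    return groups

def candidates(requested_name, groups):
    """Entries to examine at output time: one name group, not the whole file."""
    return groups.get(group_key(requested_name), [])
```

Verification of identity is still left to the output phase; the key only narrows the set of index references that must be examined when a request arrives.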
D. Debilitation of the Central System

Unless effective administrative controls are established to
assure timely and efficient support to the Central System by
participating agencies, it will atrophy from lack of
comprehensive input and, consequently, lack of use.